Low-Cost Enrichment of Spanish WordNet with Automatically Translated Glosses: Combining General and Specialized Models
Abstract
This paper studies the enrichment of the Spanish WordNet with synset glosses obtained automatically from the English WordNet glosses using a phrase-based Statistical Machine Translation system. We build the English-Spanish translation system from a parallel corpus of European Parliament proceedings, and study how to adapt the statistical models to the domain of dictionary definitions. We build specialized language and translation models from a small set of parallel definitions and experiment with robust ways of combining them with the general models. A statistically significant increase in performance is obtained. The best system is finally used to generate a definition for every Spanish synset; these definitions are now ready for manual revision. As a complementary issue, we analyze how much in-domain data is needed to improve a system trained entirely on out-of-domain data.
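As a rough illustration of what combining a general model with a specialized one can look like, the sketch below linearly interpolates two toy unigram language models and compares perplexity on a held-out definition. This is an assumption for illustration only: the abstract does not say which combination scheme the authors used, and the function names, the probability tables, and the `floor` value for unseen words are all hypothetical.

```python
import math


def interpolate(p_general, p_in_domain, lam):
    """Linear interpolation: lam weights the in-domain (specialized) model."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * p_in_domain + (1.0 - lam) * p_general


def perplexity(tokens, general, in_domain, lam, floor=1e-6):
    """Perplexity of a token sequence under the interpolated unigram model.

    `floor` is a hypothetical stand-in for proper smoothing of unseen words.
    """
    log_sum = 0.0
    for tok in tokens:
        p = interpolate(general.get(tok, floor), in_domain.get(tok, floor), lam)
        log_sum += math.log(p)
    return math.exp(-log_sum / len(tokens))


# Toy probability tables (made up): a "parliament" model and a "definitions" model.
general = {"the": 0.5, "parliament": 0.3, "of": 0.2}
in_domain = {"the": 0.3, "of": 0.3, "act": 0.4}
held_out = ["the", "act", "of"]

# Pick the interpolation weight that minimizes held-out perplexity on a coarse grid.
grid = [i / 10 for i in range(11)]
best_lam = min(grid, key=lambda lam: perplexity(held_out, general, in_domain, lam))
```

In a real system the weight would be tuned on a held-out set of in-domain definitions, and the same idea extends to phrase-table scores or to a log-linear combination of model features.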
Similar resources
The Spanish version of WordNet 3.0
In this paper we present the Spanish version of WordNet 3.0. The English resource includes the glosses (definitions and examples) and the labelling of senses with WordNet identifiers. We have translated the synsets and the glosses to Spanish and alignment has been carried out at word level, whenever possible. The project has produced two interesting results: we have obtained a bilingual (Spanis...
On the Automatic Enrichment of a Portuguese Wordnet with Dictionary Definitions
Besides synsets and semantic relations, synset glosses are an important feature of wordnets. However, due to the required effort, their creation is sometimes left undone. This happens in Onto.PT, a Portuguese wordnet created automatically, which does not have glosses. In our work, we exploited Portuguese dictionaries to automatically assign definitions to the synsets of Onto.PT. For this purpos...
Automatic Enrichment of WordNet with Common-Sense Knowledge
WordNet represents a cornerstone in the Computational Linguistics field, linking words to meanings (or senses) through a taxonomical representation of synsets, i.e., clusters of words with an equivalent meaning in a specific context often described by few definitions (or glosses) and examples. Most of the approaches to the Word Sense Disambiguation task fully rely on these short texts as a sour...
Automatic Generation of Glosses in the OntoLearn System
OntoLearn is a system for automatic acquisition of specialized ontologies from domain corpora, based on a syntactic pattern matching technique for word sense disambiguation, called structural semantic interconnection (SSI). We use SSI to extract from corpora complex domain concepts and create a specialized version of WordNet. In order to facilitate the task of domain specialists who inspect an...
Towards Spanish Verbs' Selectional Preferences Automatic Acquisition: Semantic Annotation of the SenSem Corpus
We present the results of an agreement task carried out in the framework of the KNOW Project, which consisted of manually annotating an agreement sample of 50 sentences extracted from the SenSem corpus. Disambiguation was carried out for all nouns, proper nouns and adjectives in the sample, all of which were assigned EuroWordNet (EWN) synsets. As a result of the task, Spanish WN has been sho...